Model-based clustering of categorical data based on the Hamming distance
A model-based approach is developed for clustering categorical data with no
natural ordering. The proposed method exploits the Hamming distance to define a
family of probability mass functions to model the data. The elements of this
family are then considered as kernels of a finite mixture model with an
unknown number of components. Conjugate Bayesian inference is derived for the
parameters of the Hamming distribution model. The mixture is framed in a
Bayesian nonparametric setting, and a transdimensional blocked Gibbs sampler is
developed to provide full Bayesian inference on the number of clusters, their
structure, and the group-specific parameters, easing computation relative
to customary reversible jump algorithms. The proposed model encompasses
a parsimonious latent class model as a special case when the number of
components is fixed. Model performance is assessed via a simulation study and
reference datasets, showing improvements in clustering recovery over existing
approaches.
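The abstract leaves the kernel implicit; a minimal sketch of a Hamming-distance kernel of the general kind described (the paper's exact parametrization may differ, and the decay parameter `lam` is a hypothetical stand-in for the dispersion parameter) could look like:

```python
import math
from itertools import product

def hamming(x, c):
    """Number of positions where two categorical sequences differ."""
    return sum(xi != ci for xi, ci in zip(x, c))

def hamming_pmf(x, center, lam, n_attrs, n_cats):
    """Toy kernel: p(x | center, lam) proportional to exp(-lam * d_H(x, center)).

    The normalizing constant sums over all possible distances d:
    sum_d C(p, d) * (q - 1)^d * exp(-lam * d), for p attributes with q levels each.
    """
    norm = sum(math.comb(n_attrs, d) * (n_cats - 1) ** d * math.exp(-lam * d)
               for d in range(n_attrs + 1))
    return math.exp(-lam * hamming(x, center)) / norm

# Sanity check: the pmf sums to one over all q^p categorical sequences.
p, q, lam = 3, 2, 0.7
center = (0, 0, 0)
total = sum(hamming_pmf(x, center, lam, p, q)
            for x in product(range(q), repeat=p))
```

Because the kernel depends on the data only through the Hamming distance to the center, mass decays geometrically with distance, which is what makes conjugate updates tractable.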
Mixture modeling via vectors of normalized independent finite point processes
Statistical modeling in the presence of hierarchical data is a crucial task in
Bayesian statistics. The Hierarchical Dirichlet Process (HDP) is the
reference tool for handling data organized in groups through mixture modeling.
Although the HDP is mathematically tractable, its computational cost is
typically demanding, and its analytical complexity represents a barrier for
practitioners. The present paper introduces a mixture model based on a novel
family of Bayesian priors designed for multilevel data, obtained by
normalizing a finite point process. A full distribution theory for this new
family and the induced clustering is developed, including tractable expressions
for marginal, posterior and predictive distributions. Efficient marginal and
conditional Gibbs samplers are designed for providing posterior inference. The
proposed mixture model outperforms the HDP in terms of analytical tractability,
clustering discovery, and computational time. The motivating application comes
from the analysis of shot put data, which contains performance measurements of
athletes across different seasons. In this setting, the proposed model is
used to induce clustering of the observations across seasons and athletes.
By linking clusters across seasons, similarities and differences in athletes'
performances are identified.
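A minimal sketch of the normalization step that gives the prior family its name, with illustrative Poisson and gamma choices that are assumptions rather than the paper's actual specification:

```python
import numpy as np

def normalized_finite_weights(rng, poisson_rate=3.0, gamma_shape=1.0):
    """Sketch: mixture weights from a normalized finite point process.

    Draw a finite random number of support points (at least one), attach
    independent unnormalized gamma jumps, and normalize the jumps into
    mixture weights. The Poisson/gamma choices here are illustrative only.
    """
    m = 1 + rng.poisson(poisson_rate)       # finite, random number of atoms
    jumps = rng.gamma(gamma_shape, size=m)  # independent unnormalized jumps
    return jumps / jumps.sum()              # normalization step

rng = np.random.default_rng(0)
w = normalized_finite_weights(rng)
```

The key contrast with the HDP is that the number of atoms is finite and the jumps are independent, which is what keeps marginal and posterior expressions tractable.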
Gaussian graphical modeling for spectrometric data analysis
Motivated by the analysis of spectrometric data, we introduce a Gaussian
graphical model for learning the dependence structure among frequency bands of
the infrared absorbance spectrum. The spectra are modeled as continuous
functional data through a B-spline basis expansion and a Gaussian graphical
model is assumed as a prior specification for the smoothing coefficients to
induce sparsity in their precision matrix. Bayesian inference is carried out to
simultaneously smooth the curves and to estimate the conditional independence
structure between portions of the functional domain. The proposed model is
applied to the analysis of infrared absorbance spectra of strawberry purees.
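The key device here, zeros in the precision matrix encoding conditional independence between smoothing coefficients, can be illustrated on a toy chain graph (the tridiagonal structure below is an illustrative assumption, not the model fitted in the paper):

```python
import numpy as np

# Tridiagonal precision matrix for 4 coefficients: zeros off the first
# band encode conditional independence between non-adjacent coefficients,
# as one might assume for smoothing coefficients of neighboring bands.
K = (np.diag(np.full(4, 2.0))
     + np.diag(np.full(3, -0.8), 1)
     + np.diag(np.full(3, -0.8), -1))
Sigma = np.linalg.inv(K)  # implied covariance matrix

# Marginally, coefficients 0 and 2 are correlated through coefficient 1...
marginal_corr_02 = Sigma[0, 2] / np.sqrt(Sigma[0, 0] * Sigma[2, 2])

# ...but their partial correlation, read directly off the precision
# matrix, is exactly zero: K[0, 2] == 0.
partial_corr_02 = -K[0, 2] / np.sqrt(K[0, 0] * K[2, 2])
```

This is why sparsity is placed on the precision matrix rather than the covariance: a zero entry there corresponds directly to conditional independence between portions of the functional domain.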
Dynamic model-based clustering for spatio-temporal data
In many research fields, scientific questions are investigated by analyzing data collected over space and time, usually at fixed spatial locations and time steps, resulting in geo-referenced time series. In this context, it is of interest to identify potential partitions of the space and study their evolution over time. A finite space-time mixture model is proposed to identify level-based clusters in spatio-temporal data and study their temporal evolution along the time frame. We account for space-time dependence by introducing spatio-temporally varying mixing weights that allocate observations at nearby locations and consecutive time points with similar cluster membership probabilities. As a result, a clustering varying over time and space is obtained. Conditionally on cluster membership, a state-space model is deployed to describe the temporal evolution of the sites belonging to each group. Full posterior inference is provided under a Bayesian framework through Markov chain Monte Carlo algorithms. Also, a strategy to select a suitable number of clusters based on the posterior temporal patterns of the clusters is offered. We evaluate our approach through simulation experiments and illustrate it using air quality data collected across Europe from 2001 to 2012, showing the benefit of borrowing strength across space and time.
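A toy sketch of spatio-temporally varying mixing weights of this flavor, with hypothetical latent fields (spatial smoothing across neighboring sites is omitted for brevity; only the slowly varying time component is shown):

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
n_sites, n_times, n_clusters = 5, 4, 3

# Hypothetical latent fields: a site-specific effect plus a slowly varying
# linear time trend, so consecutive time points at a given site receive
# similar cluster membership probabilities.
site_effect = rng.normal(size=(n_sites, 1, n_clusters))
time_trend = (np.linspace(0.0, 1.0, n_times)[None, :, None]
              * rng.normal(size=(1, 1, n_clusters)))

weights = softmax(site_effect + time_trend)  # shape (sites, times, clusters)
```

Each site-time pair gets its own probability vector over clusters, which is what allows the partition itself to drift smoothly over the time frame.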
Estimate of overdiagnosis of breast cancer due to mammography after adjustment for lead time. A service screening study in Italy
INTRODUCTION: An excess of incidence rates is the expected consequence of service screening. The aim of this paper is to estimate the proportion attributable to overdiagnosis in the breast cancer screening programmes in Northern and Central Italy. METHODS: All patients with breast cancer diagnosed at ages 50 to 74 who were resident in screening areas in the six years before and five years after the start of the screening programme were included. We calculated a corrected-for-lead-time number of observed cases for each calendar year: the number of observed incident cases was reduced by the number of screen-detected cases in that year and incremented by the estimated number of screen-detected cases that would have arisen clinically in that year. RESULTS: In total, we included 13,519 and 13,999 breast cancer cases diagnosed in the pre-screening and screening years, respectively. Overall, the excess ratio of observed to predicted in situ and invasive cases was 36.2%. After correction for lead time the excess ratio was 4.6% (95% confidence interval 2% to 7%), and for invasive cases only it was 3.2% (95% confidence interval 1% to 6%). CONCLUSION: The remaining excess of cancers after individual correction for lead time was lower than 5%.
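The lead-time correction described in METHODS amounts to simple per-year bookkeeping; a sketch with hypothetical counts (not the study's actual figures):

```python
def corrected_observed(observed, screen_detected, would_have_surfaced):
    """Lead-time correction for one calendar year, as described above:
    subtract the screen-detected cases, then add back the estimated number
    of screen-detected cases that would have arisen clinically that year."""
    return observed - screen_detected + would_have_surfaced

def excess_ratio(observed_total, predicted_total):
    """Percent excess of observed over predicted incidence."""
    return 100.0 * (observed_total - predicted_total) / predicted_total

# Hypothetical illustration (these counts are made up for the example):
corr = corrected_observed(observed=1200, screen_detected=400,
                          would_have_surfaced=250)
ratio = excess_ratio(observed_total=corr, predicted_total=1000)
```

The correction removes the artificial surplus created by diagnosing cases earlier than they would have surfaced, so any excess that remains is a candidate estimate of overdiagnosis.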
Breast screening: axillary lymph node status of interval cancers by interval year
The aim of this study was to determine whether the excess risk of axillary lymph node metastases (N+) differs between interval breast cancers arising shortly after a negative mammography and those presenting later. In a registry-based series of pT1a–pT3 breast carcinoma patients aged 50–74 years from the Italian screening programmes, the odds ratio (OR) for interval cancers (n = 791) versus screen-detected (SD) cancers (n = 1211) having N+ was modelled using forward stepwise logistic regression analysis. The interscreening interval was divided into 1–12, 13–18, and 19–24 months. The prevalence of N+ was 28% among SD cancers. With a prevalence of 38%, 42%, and 44%, the adjusted (demographics and N staging technique) ORs of N+ for cancers diagnosed at 1–12, 13–18, and 19–24 months of interval were 1.41 (95% confidence interval 1.06–1.87), 1.74 (1.31–2.31), and 1.91 (1.43–2.54), respectively. Histologic type, tumour grade, and tumour size were entered in turn into the model. Histologic type had modest effects. With adjustment for tumour grade, the ORs decreased to 1.23 (0.92–1.65), 1.58 (1.18–2.12), and 1.73 (1.29–2.32). Adjusting for tumour size decreased the ORs to 0.95 (0.70–1.29), 1.34 (0.99–1.81), and 1.37 (1.01–1.85). The strength of confounding by tumour size suggested that the excess risk of N+ for first-year interval cancers reflected only their greater chronological age, whereas the increased aggressiveness of second-year interval cancers was partly accounted for by intrinsic biological attributes.
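As a point of reference, the crude (unadjusted) odds ratio implied by the reported prevalences can be computed directly; it differs from the adjusted 1.41 because the fitted model also conditions on demographics and N staging technique:

```python
def odds(p):
    """Convert a prevalence (proportion) to odds."""
    return p / (1.0 - p)

def odds_ratio(p_exposed, p_reference):
    """Crude (unadjusted) odds ratio of node positivity."""
    return odds(p_exposed) / odds(p_reference)

# Prevalences reported above: 28% N+ among screen-detected cancers,
# 38% among interval cancers diagnosed within 12 months of screening.
crude_or = odds_ratio(0.38, 0.28)  # roughly 1.58
```

The gap between the crude 1.58 and the adjusted 1.41 reflects exactly the kind of confounding the stepwise analysis was designed to probe.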
Clinical epigenetics settings for cancer and cardiovascular diseases: real-life applications of network medicine at the bedside
Despite impressive efforts invested in epigenetic research in the last 50 years, clinical applications are still lacking. Only a few university hospital centers currently use epigenetic biomarkers at the bedside. Moreover, the overall concept of precision medicine is not widely recognized in routine medical practice, and the reductionist approach remains predominant in treating patients affected by major diseases such as cancer and cardiovascular diseases. By its very nature, epigenetics is integrative of genetic networks. The study of epigenetic biomarkers has led to the identification of numerous drugs with an increasingly significant role in clinical therapy, especially for cancer patients. Here, we provide an overview of clinical epigenetics within the context of network analysis. We illustrate achievements to date and discuss how we can move from traditional medicine into the era of network medicine (NM), where pathway-informed molecular diagnostics will allow treatment selection following the paradigm of precision medicine.
Bayesian space-time data fusion for real-time forecasting and map uncertainty
Environmental computer models are deterministic models devoted to predicting environmental phenomena such as air pollution or meteorological events. Numerical model output is given in terms of averages over grid cells, usually at high spatial and temporal resolution. However, these outputs are often biased, with unknown calibration, and are not equipped with any information about the associated uncertainty. Conversely, data collected at monitoring stations are more accurate, since they essentially provide the true levels. Given the leading role played by numerical models, it is now important to compare model output with observations. Statistical methods developed to combine numerical model output and station data are usually referred to as data fusion.
In this work, we first combine ozone monitoring data with ozone predictions from the Eta-CMAQ air quality model in order to forecast in real time the current 8-hour average ozone level, defined as the average of the previous four hours, the current hour, and the predictions for the next three hours. We propose a Bayesian downscaler model based on first differences with a flexible coefficient structure and an efficient computational strategy to fit the model parameters. Model validation for the eastern United States shows consequential improvement of our fully inferential approach compared with the current real-time forecasting system. Furthermore, we consider the introduction of temperature data from a weather forecast model into the downscaler, showing improved real-time ozone predictions.
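The 8-hour average being forecast is a simple sliding window over observed and predicted hours; a sketch with hypothetical hourly values:

```python
def forecast_8h_average(past_4h, current, next_3h):
    """Current 8-hour average ozone as defined above: the mean of the
    previous four hourly values, the current hour, and the three
    forecast hours that follow."""
    assert len(past_4h) == 4 and len(next_3h) == 3
    window = list(past_4h) + [current] + list(next_3h)
    return sum(window) / len(window)

# Hypothetical hourly ozone values (ppb), purely for illustration:
avg = forecast_8h_average([60, 62, 64, 66], 68, [70, 72, 74])
```

Because three of the eight hours are model forecasts rather than observations, the quality of the downscaler's short-horizon predictions feeds directly into this regulatory quantity.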
Finally, we introduce a hierarchical model to obtain spatially varying uncertainty associated with numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. We illustrate our Bayesian model by providing the uncertainty map associated with a temperature output over the northeastern United States.
Quantifying uncertainty associated with a numerical model output
Environmental numerical models are deterministic tools widely used to simulate and predict complex systems. However, they are unsatisfactory in that they do not provide information about the uncertainty associated with their predictions. Conversely, uncertainty assessment of model outputs can be useful to guide environmental agencies in improving computer models. We propose a Bayesian hierarchical model to obtain spatially varying uncertainty associated with a numerical model output. We show how we can learn about such uncertainty through suitable stochastic data fusion modeling using some external validation data. The model is illustrated by providing the uncertainty map associated with a temperature output over the northeastern United States.